Data sanitization in association rule mining based on impact factor
Abstract
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database from which counterparts cannot discover the sensitive patterns, so data confidentiality is preserved against association rule mining. The process relies strongly on minimizing the impact of sanitization on data utility, that is, on minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, from the database while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on the non-sensitive itemsets. The proposed algorithm is compared with the Sliding Window size Algorithm (SWA) and Max-Min1 in terms of execution time, data utility, and data accuracy. Data accuracy is defined as the ratio of deleted items to the total support values of sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in maximizing data utility and data accuracy, and that it provides better execution time than SWA and Max-Min1 at high scale in the number of sensitive itemsets and transactions.
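The idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes that the "impact factor" of deleting an item from a transaction can be estimated as the number of non-sensitive frequent itemsets whose support that deletion would lower, and it greedily picks the deletion with the lowest estimate. The function names, the toy database, and the brute-force itemset enumeration are all hypothetical choices made for illustration.

```python
from itertools import combinations

def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_sup, max_len=3):
    """Brute-force enumeration of frequent itemsets (toy sizes only)."""
    items = sorted(set().union(*db))
    return {
        frozenset(c)
        for k in range(1, max_len + 1)
        for c in combinations(items, k)
        if support(db, frozenset(c)) >= min_sup
    }

def sanitize(db, sensitive, min_sup, max_len=3):
    """Hide each sensitive itemset by deleting items, lowest impact first.

    ASSUMPTION: impact = count of non-sensitive frequent itemsets that
    contain the deleted item and are supported by the modified transaction.
    """
    db = [set(t) for t in db]
    # Itemsets that contain a sensitive itemset must be hidden as well,
    # so they are not counted as non-sensitive.
    non_sensitive = {
        x for x in frequent_itemsets(db, min_sup, max_len)
        if not any(s <= x for s in sensitive)
    }
    deleted = 0
    for s in sensitive:
        while support(db, s) >= min_sup:
            candidates = [
                (sum(1 for x in non_sensitive if item in x and x <= t), idx, item)
                for idx, t in enumerate(db) if s <= t
                for item in sorted(s)
            ]
            _, idx, item = min(candidates)  # ties broken by index, then item
            db[idx].discard(item)
            deleted += 1
    return db, non_sensitive, deleted

# Toy data, invented for this example.
source = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
secret = frozenset({"a", "b"})
min_sup = 2

clean, non_sensitive, deleted = sanitize(source, [secret], min_sup)
lost = non_sensitive - frequent_itemsets(clean, min_sup)
# Ratio as worded in the abstract: deleted items over the total support
# of the sensitive itemsets in the source dataset.
accuracy_ratio = deleted / support(source, secret)
print(support(clean, secret), len(lost), accuracy_ratio)
```

In this toy run the sensitive itemset falls below the support threshold while every non-sensitive frequent itemset stays minable. The brute-force enumeration is exponential in the number of items, so a practical version would use a proper frequent-itemset miner.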
Similar resources
Generalized Association Rule Mining Algorithms Based on Multidimensional Data
This paper proposes a new formalized definition of generalized association rules based on multidimensional data. The algorithms BorderLHSs and GenerateLHSs-Rule are designed to generate generalized association rules from multi-level frequent itemsets based on multidimensional data. Experiments show that the proposed algorithms are more efficient and generate fewer redundant r...
Association Rule Mining on Distributed Data
Applications that process large amounts of data face two major problems as the data grows: storing and managing the data, and processing time. Distributed databases solve the first problem to a great extent, but they aggravate the second. Because this is an era of networking and communication, and people are interested in keeping large data on networks, researc...
Privacy Preserving Association Rule Mining based on the Intersection Lattice and Impact Factor of Items
Association Rules revealed by association rule mining may contain some sensitive rules, which may cause prospective threats towards privacy and protection. A number of researchers in this area have recently made efforts to preserve privacy for sensitive association rules in transactional databases. In this paper, we put forward a heuristic based association rule hiding algorithm to get rid of t...
Association Rule Mining Based On Trade List
In this paper, a new mining algorithm based on frequent itemsets is defined. The Apriori algorithm scans the database every time it finds a frequent itemset, which is very time consuming, and it generates candidate itemsets at each step. For large databases, storing the candidate itemsets therefore takes a lot of space. The undirected itemset graph is an improvement on Apriori, but it takes time and sp...
Journal title:
Journal of AI and Data Mining
Publisher: Shahrood University of Technology
ISSN 2322-5211
Volume 3
Issue 2, 2015